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ABSTRACT 

A high-breakdown estimator is a robust statistic that 
can withstand a large amount of contaminated data. In linear 
regression, high-breakdown estimators can detect outliers and 
distinguish between good and bad leverage points. This paper 
summarizes the case for high-breakdown regression and emphasizes the 
least quartile difference estimator (LQD) proposed by C. Croux, P. J. 
Rousseeuw, and 0. Hossjer (1994). This regression method examines the 
absolute differences between every pair of residuals and minimizes 
the first quartile of these differences with an adjustment for 
degrees of freedom. LQD is affine equivalent and has a 50% breakdown 
point, the highest possible. Its asymptotic efficiency is about 67% 
of least-squares, so LQD should be able to deal with anomalous 
observations and should also perform well when the data are not 
contaminated. Although interest in the approach is growing, software 
is still not widely available. An appendix presents a BASIC computer 
program for one type of high-breakdown regression. (Contains 17 
references.) (SLD) 
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Abstract. This paper is a concise presentation of high- 
breakdown regression with emphasis on the least 
quartile difference (LQD) estimator proposed by 
Croux, Rousseeuw, and Hossjer. The robustness 
of high-breakdown regression is discussed, and 
a Basic program for LQD is provided. 
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High-breakdown Regression by Least Quartile Differences 

Introduction 

A high-breakdown estimator is a robust statistic which can withstand a 
large amount of contaminated data. In linear regression, high-breakdown 
estimators can detect outliers and distinguish between good and bad leverage 
points. Although most high-breakdown estimators are based on resampling 
schemes which require extensive computation, recent improvements in 
hardware and algorithms have made these robust methods practical. 

This paper summarizes the case for high-breakdown regression. Although 
interest in the topic is growing and the literature is extensive, software is still not 
widely available. Some public-domain sources are mentioned, and an appendix 
contains a Basic program for one type of high-breakdown regression. 

The breakdown point 

If a few bad observations can induce an arbitrarily large bias in an 
estimator, the estimator is said to have a low breakdown point (Rousseeuw and 
Leroy 1987, chapter 1). For instance, in a sample of n data, the breakdown point 
of a least-squares (LS) regression is 1/n. Just one outlier, if it is bad enough, can 
throw the LS line indefinitely far off target. On the other hand, a robust method 
has a high breakdown point; it can deal with a lot of contamination. The highest 
achievable breakdown point is fifty percent. If more than half the observations 
are contaminated, a linear regression method cannot distinguish the good 
observations from the bad. 

One kind of contamination is a regression outlier, a stray observation on 
the dependent variable. Another kind of contamination is a bad leverage point, a 
stray observation on an independent variable. A high-breakdown method is 
unaffected by regression outliers and bad leverage points but does not ignore 
good leverage points, those observations which lie apart from the other data but 
are still close to the regression line. Good leverage points improve the precision 
of the estimates. 

In conjunction with other criteria, the breakdown concept can be used to 
evaluate the sturdiness of various statistics. The sample median is very robust; it 
attains the maximum breakdown point. The LI norm, a multivariate version of the 
median, minimizes the sum of absolute residuals rather than the sum of squared 
residuals (Dodge 1987, 1992). The LI norm is not vulnerable to regression 
outliers but is vulnerable to bad leverage points, so it shares the low breakdown 
point of LS, as do the M-estimators of Huber (1981). The generalized M- 
estimators (Hampel et al. 1986, chapter 6) can cope with bad leverage points. 
Unfortunately, their breakdown point is inversely related to the number of 
variables in the model (Rousseeuw and Leroy 1987, chapter 6), so they may be 
affected by aberrant observations in high-dimensional regression problems. 

The LS residuals may not be reliable indicators of outliers. If the 
regression line has broken down, some good observations will have large LS 
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residuals and some bad observations will have small residuals. The standard 
tests for heteroscedasticity based on the LS residuals can be quite misleading. 
The same remark applies to classical diagnostics like Cook's distance and the 
"hat" matrix. To end up with a robust estimate, one has to start with a robust 
estimate. 

It is tempting to seek robustness by examining the variables one at a time 
before running the regression. Robust measures of location and dispersion can 
be computed, while a histogram of each variable may suggest a simple 
transformation to reduce skewness. This common-sense approach, involving a 
careful inspection of the data, is certainly worthwhile. However, it is not 
guaranteed to reveal all the outliers and bad leverage points because the 
troublesome data are aberrant with respect to the hypothesized regression line, 
not with respect to a particular variable. 

Of course, some robustness problems are more tractable than others. For 
example, certain models are not subject to bad leverage points. This is trivially 
the case for univariate samples, but it is also true when the independent 
variables in a regression model are deterministic. Examples are simple trends 
and dummy variables, including ANOVA models. The L1 norm or an M-estimator 
is very appropriate in these cases, where the possibility of contamination is 
limited to the dependent variable. 

In summary, a high-breakdown point is a desirable robustness property, 
especially in regression and other multivariate models where leverage points 
can occur. As it happens, combining high breakdown with other good statistical 
properties is not always straightfon/vard. In particular, a linear regression 
procedure should be reasonably efficient compared to LS when the data are free 
of contamination. Moreover, the procedure ought to behave well under linear 
transformations of the variables: the regression coefficients should adjust in an 
obvious way when the dependent variable or an independent variable undergoes 
a shift of origin or a change of scale. LS has this affine equivariance, of course, 
and so should robust regression. 

High-breakdown regression 

Rousseeuw proposed the first high-breakdown, affine-equivariant 
estimator for linear regression (Rousseeuw 1984, Rousseeuw and Leroy 1987, 
chapters 1-3, 5). It is called least median of squares (LMS) because the 
regression line minimizes the median squared residual. The geometric 
interpretation for bivariate regression is instructive. LMS finds the narrowest strip 
that covers at least half the observations. (Narrowness is measured in the 
direction of the dependent variable.) The regression line lies in the middle of the 
strip. In other words, LMS tries to fit the majority of the observations, ignoring the 
aberrant data. This explains its asymptotic breakdown point of fifty percent. . 

When the regression errors are in fact drawn from a well-behaved 
gaussian distribution, LMS suffers from the low efficiency common to many 
robust techniques. There are several proposals for high-breakdown, efficient 
regression estimators. One can use a two-step procedure, in which the LMS 
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residuals are used as weights in a LS regression. Various weighting schemes 
have been discussed (Rousseeuw and Leroy 1987, chapter 3). 

Alternatively, the median squared residual can be replaced by a more 
efficient minimand. Croux, Rousseeuw, and Hossjer (1994) introduced the 
Least Quartile Distance (LQD) estimator. This regression method examines the 
absolute differences between every pair of residuals and minimizes the first 
quartile of those differences (with an adjustment for degrees of freedom). It is 
therefore a multivariate version of the mode, a "nearest neighbor" estimator 
which is unaffected by skewed data. LQD is affine equivariant and has a fifty 
percent breakdown point, the highest possible. Its asymptotic efficiency is about 
67 percent of LS --very high for a robust method. Therefore, LQD should be 
able to deal with anomalous observations and at the same time should perform 
well when the data are not contaminated. 

Like LMS, LQD is computed using a resampling scheme. To estimate k 
regression coefficients, a subsample of k observations is drawn at random from 
the data set. The regression coefficients are calculated; and the residuals are 
computed for the entire sample. The differences between each pair of residuals 
are obtained, and the first quartile of their absolute values is evaluated. This 
process is repeated many times, and the final regression line is based on the 
subsample with the least quartile difference. 

Rousseeuw and Leroy (1987, chapter 5) have discussed the amount of 
resampling needed to obtain a robust estimate with high probability. Some 
guidelines are offered in the Basic program itself. [Rousseeuw (1993) has 
recently proposed an alternative sampling scheme which has not, however, 
been implemented in the Basic program for LQD.] Obviously, LQD regression is 
much more "computer intensive" than LS or even the LI norm. (The latter is a 
linear programming problem.) For n observations, the LQD requires, in every 
subsample, an order statistic of n(n-1)/2 differences between pairs of residuals. 
Even for moderate n, the calculations would be impractical were it not for 
Rousseeuw and Croux's clever algorithm. 

When the LQD regression has flagged several observations as potential 
outliers, the researcher will probably want to scrutinize those data. They will have 
to be discarded, downweighted, or validated and reinstated. Then the 
researcher may choose to apply LS to the revised sample in order to examine 
the usual F-statistic, t-ratios, and so on. Since the sample has undergone a 
complicated "pretest," those diagnostics are no longer strictly valid for classical 
hypothesis tests. However, they may provide a general impression of the model's 
adequacy. 

Nonlinear parameters pose further computational and conceptual issues 
for high-breakdown regression, some of which have been explored by Stromberg 
(1993) and Stromberg and Ruppert (1992). 




5 



5 



Programs for high-breakdown regression 

Appendix A contains a Basic program for LQD. The program is simple but 
serviceable, with few bells and whistles. Users are encouraged to add code to 
enhance the flexibility of input, the labeling of output, the treatment of missing 
data, graphics, and other features. 

The on-line statistical resource STATLIB currently provides Fortran source 
code for several procedures. This author is aware of Rousseeuw and Leroy's 
PROGRESS program for LMS regression and their MINVOL program for 
covariance matrices; Rousseeuw and Croux's Sn and Qn scale estimators: and 
Douglas Hawkins' feasible-solution algorithms (FSA) for regression and 
covariance matrices. Rocke and Woodruff have provided C code for a program 
to estimate high-breakdown covariance matrices. Other public-domain and 
commercial implementations are no doubt available. 

The literature on robust regression includes many well-studied data sets, 
both real and hypothetical, which allow researchers to compare LS, L1 , LMS, 
LQD and other methods. In particular, the monograph by Rousseeuw and Leroy 
(1987) contains numerous examples. 
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Appendix A. A Basic Program for the Least Quartile 
Difference Regression 
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REM 

REM 

REM 

REM 

REM 
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REM 

REM 
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REM 
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REM 

REM 

REM 
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REM 
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REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 



THIS PROGRAM COMPUTES A ROBUST LINEAR REGRESSION CALLED 
THE LEAST QUARTILE DIFFERENCE ESTIMATOR (LQD) . IT WAS 
PROPOSED IN THE FOLLOWING PAPER: 

CHRISTOPHE CROUX, PETER J. ROUSSEEUW, AND OLA 
HOSSJER (1994), "GENERALIZED S -ESTIMATORS, " 

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 

89, 1271-1281. 

THE LQD IS CLOSELY RELATED TO OTHER HIGH-BREAKDOWN, 
COMPUTER- INTENSIVE REGRESSION METHODS LIKE LEAST 
MEDIAN OF SQUARES AND LEAST TRIMMED SQUARES. WHEN THE 
RANDOM ERRORS ARE IN FACT GAUSSIAN, THE LQD IS MORE 
EFFICIENT THAN THESE OTHER HIGH -BREAKDOWN METHODS. 

(ITS ASYMPTOTIC EFFICIENCY IS ABOUT 67 PERCENT 
COMPARED TO LEAST SQUARES AT A GAUSSIAN DISTRIBUTION.) 
LQD RESEMBLES A MODE; IT IS A "NEAREST NEIGHBOR" 
ESTIMATOR WHICH COMPARES THE ABSOLUTE DIFFERENCE 
BETWEEN EVERY PAIR OF RESIDUALS AND MINIMIZES A CERTAIN 
ORDER STATISTIC OF THOSE DIFFERENCES. 

THE CURRENT VERSION OF THE PROGRAM IS DIMENSIONED FOR 
7 INDEPENDENT VARIABLES AND 500 OBSERVATIONS. A CONSTANT 
IS AUTOMATICALLY INCLUDED IN THE REGRESSION EQUATION. 

THE PROGRAM ASSUMES THAT THE ROWS OF THE DATA MATRIX 
ARE OBSERVATIONS, WHILE THE COLUMNS ARE VARIABLES. THE 
LAST COLUMN SHOULD CONTAIN THE DEPENDENT VARIABLE. TO 
MAINTAIN NUMERICAL ACCURACY, THE USER SHOULD SCALE THE 
DATA SO THAT ALL THE VARIABLES HAVE THE SAME ORDER OF 
MAGNITUDE . 

LIKE MOST OTHER HIGH -BREAKDOWN REGRESSION METHODS, 

LQD IS BASED ON A RESAMPLING SCHEME; THE REGRESSION 
LINE IS FIT TO A LARGE NUMBER OF SUBSAMPLES, AND THE 
LINE THAT MINIMIZES A ROBUST CRITERION IS SELECTED. USE 
ENOUGH SUBSAMPLES (ITR) TO PROVIDE VIRTUAL ASSURANCE 
THAT THERE WILL BE SEVERAL UNCONTAMINATED SUBSAMPLES. 

IN THEIR MONOGRAPH, ROBUST REGRESSION AND OUTLIER 
DETECTION (NEW YORK: WILEY, 1987), P. J. ROUSSEEUW AND 
A. M. LEROY DISCUSS THE RESAMPLING ISSUE IN DETAIL AND 
OFFER THE FOLLOWING ROUGH GUIDELINES: 



INDEPENDENT 

VARIABLES 

1 

2 

3 

4 

5 

6 OR MORE 



MINIMUM NUMBER 
OF SUBSAMPLES 
500 
1,000 

1.500 

2,000 

2.500 
3,000 



FOR FURTHER DISCUSSION, THE USER SHOULD CONSULT THE 



REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

REM 

4002 

4004 

4006 

4008 

4010 

4012 

4015 

4020 

4025 

4030 

4035 

4040 

4045 

4050 

4055 

4060 

4065 

4070 

4075 

4080 

4085 

4090 

4095 

4100 

4110 

4120 

4130 

4135 

4140 

4150 

4160 

4170 

4180 

4185 
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REFERENCES CITED ABOVE. 

THE GAUSS -JORDAN ROUTINE FOR MATRIX INVERSION IS 
ADAPTED FROM W. W. COOLEY AND P. R. LOHNES (1962), 
MULTIVARIATE PROCEDURES FOR THE SOCIAL SCIENCES, 

NEW YORK: WILEY. 

THE FUNCTIONS AND SUBROUTINES IN THIS PROGRAM ARE 
ADAPTED FROM "QN.FOR" -- A FORTRAN PROGRAM FOR 
ROBUST SCALE ESTIMATES WRITTEN BY P. J. ROUSSEEUW 
AND HIS COLLEAGUES. 

DETAILS ON THE QN SCALE ESTIMATE ARE GIVEN IN A 
PAPER BY P. J. ROUSSEEUW AND C. CROUX (1993) : 

"ALTERNATIVES TO THE MEDIAN ABSOLUTE DEVIATION, " 
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 

88, 1273-1283. 

THERE IS NO WARRANTY, EXPRESSED OR IMPLIED, FOR THIS 
PROGRAM. ITS SUITABILITY FOR COMMERCIAL USE OR FOR 
ANY PARTICULAR PURPOSE IS NOT GUARANTEED. 

DEFINT I-N 
DEFDBL C-D 

DECLARE FUNCTION SCALE (X(), N,K) 

DECLARE FUNCTION FMED (A(), N, NH) 

DECLARE FUNCTION WHIMED (A(), IW(), NX) 

DECLARE SUB SORT (A(), N, B()) 

RANDOMIZE TIMER 

PRINT "WHAT IS THE INPUT FILE ?" 

INPUT INFILE$ 

PRINT "WHAT IS THE OUTPUT FILE ?" 

INPUT OUTFILE$ 

OPEN INFILE$ FOR INPUT AS #1 
OPEN OUTFILE$ FOR OUTPUT AS #2 
PRINT "HOW MANY OBSERVATIONS ?" 

INPUT N 

REM THE DEPENDENT VARIABLE MUST BE IN THE LAST COLUMN 
REM OF THE DATA ARRAY. 

PRINT "HOW MANY VARIABLES (INDEPENDENT + DEPENDENT) ?" 
INPUT K 

PRINT "HOW MANY SUBSAMPLES SHOULD BE DRAWN ?" 

INPUT ITR 

DIM VI (500, 8), R(500), BLQD(8), C(8, 8), D(8) 
DIMVD(500), IPIVOT(8), PIVOT(8), INDEX(8, 2), M(8) 
CRITMIN = 1000000# 

PIVMIN = .000001 
RK = K 
RN = N 

REM READ THE DATA 
FOR I = 1 TO N 
FOR J = 1 TO K 
INPUT #1, VI (I, J) 

NEXT J 
NEXT I 

REM INCLUDE A CONSTANT (INTERCEPT) IN THE REGRESSION 
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4190 FOR I = 1 TO N 
4200 VD(I) = VI(I, K) 

4210 VI(I, K) = 1! 

4220 NEXT I 

4235 REM START ITERATIONS; CHOOSE A RANDOM SUBSAMPLE OF 

4240 FOR L = 1 TO ITR 

4250 PRINT "INTERATION L 

4260 FOR I = 1 TO K 

4262 M(I) = INT(RND * N) + 1 

4264 NEXT I 

4266 FOR I = 1 TO K 

4268 FOR J = 1 TO K 

4270 IF I = J GOTO 4274 

4272 IF M(I) = M(J) GOTO 4260 

4274 NEXT J 

4276 NEXT I 

4278 FOR I = 1 TO K 

4280 MI = M(I) 

4282 D(I) = VD(MI) 

4290 FOR J = 1 TO K 
4300 C(I, J) = VI (MI, J) 

4310 NEXT J 
4320 NEXT I 

4325 REM INVERT THE KxK MATRIX 

4330 DETERM = 11 

4340 FOR J = 1 TO K 

4350 IPIVOT(J) = 0 

4360 NEXT J 

4370 FOR I = 1 TO K 

4380 CMAX = 01 

4390 FOR J = 1 TO K 

4400 IF IPIVOT(J) = 1 GOTO 4480 

4410 FOR JJ = 1 TO K 

4420 IF IPIVOT(JJ) = 1 GOTO 4470 

4422 IF IPIVOT(JJ) > 1 GOTO 4890 

4430 IF ABS(CMAX) >= ABS(C(J, K) ) GOTO 4470 

4440 IROW = J 

4450 ICOLUM = JJ 

4460 CMAX = C(J, JJ) 

4470 NEXT JJ 
4480 NEXT J 

4485 IPIVOT( ICOLUM) = I PIVOT (ICOLUM) + 1 

4490 IF IROW = ICOLUM GOTO 4590 

4500 DETERM = -DETERM 

4510 FOR J = 1 TO K 

4520 ZWAP = CdROW, J) 

4530 CdROW, J) = C (ICOLUM, J) 

4540 C (ICOLUM, J) = ZWAP 

4550 NEXT J 

4560 ZWAP = D(IROW) 

4570 D(IROW) = D (ICOLUM) 

4580 D (ICOLUM) = ZWAP 
4590 INDEX (I, 1) = IROW 
4600 INDEX (I, 2) = ICOLUM 




DATA 



10 



4610 

REM 

REM 

4612 

4620 

4630 

4640 

4650 

4660 

4670 

4680 

4690 

4700 

4710 

4720 

4730 

4740 

4750 

4760 

4770 

4780 

4790 

4800 

4810 

4820 

4830 

4840 

4850 

4860 

4870 

4880 

4890 

4900 

4910 

4920 

4930 

4940 

4950 

4970 

REM 

5210 

5220 

5230 

5270 

5280 

5290 

5300 

5310 

5435 

5440 

5450 

5460 

5470 

REM 
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PIVOT (I) = CdCOLUM, ICOLUM) 

IF THE SUBSAMPLE IS SINGULAR, OR NEARLY SO, DISCARD 
IT AND DRAW ANOTHER SUBSAMPLE. 

IF ABS (PIVOT(I) ) < PIVMIN GOTO 4260 
DETERM = DETERM * PIVOT (I) 

CdCOLUM, ICOLUM) = 1! 

FOR J = 1 TO K 

CdCOLUM, J) = CdCOLUM, J) / PIVOT (I) 

NEXT J 

D (ICOLUM) = D (ICOLUM) / PIVOT (I) 

FOR JJ = 1 TO K 

IF JJ = ICOLUM GOTO 4760 

T = C(JJ, ICOLUM) 

C(JJ, ICOLUM) = 0! 

FOR J = 1 TO K 

C(JJ, J) = C(JJ, J) - CdCOLUM, J) * T 
NEXT J 

D(JJ) = D(JJ) - D (ICOLUM) * T 
NEXT JJ 
NEXT I 

FOR I = 1 TO K 
JJ = K + 1 - I 

IF INDEX (JJ, 1) = INDEX (JJ, 2) GOTO 4880 
JROW = INDEX (JJ, 1) 

JCOLUM = INDEX (JJ, 2) 

FOR J = 1 TO K 
ZWAP = C(J, JROW) 

C(J, JROW) = C(J, JCOLUM) 

C(K, JCOLUM) = ZWAP 
NEXT J 
NEXT I 

REM COMPUTE ROBUST RESIDUALS FOR THE ENTIRE SAMPLE 
FOR I = 1 TO N 
SUM =0! 

FOR J = 1 TO K 

SUM = SUM + VI (I, J) * D(J) 

NEXT J 

R(I) = VD(I) - SUM 
NEXT I 

COMPUTE THE ROBUST CRITERION QN. 

CRIT = SCALE (R(), N, K) 

IF CRITMIN <= CRIT GOTO 5300 
CRITMIN = CRIT 
FOR J = 1 TO K 
BLQD(J) = D(J) 

NEXT J 
NEXT L 

PRINT #2, "ROBUST SCALE ESTIMATE ", CRITMIN 

REM PRINT THE LQD REGRESSION COEFFICIENTS 

PRINT #2, "LQD REGRESSION COEFFICIENTS (CONSTANT LAST)" 

FOR J = 1 TO K 

PRINT #2, J, BLQD(J) 

NEXT J 

STANDARDIZE THE ROBUST RESIDUALS AND IDENTIFY POTENTIAL 



REM OUTLIERS . 

5475 PRINT #2, " " 

5477 PRINT #2, "STANDARDIZED ROBUST RESIDUALS >= 2.5" 
5480 FOR I = 1 TO N 

5490 SUM = 0! 

5500 FOR J = 1 TO K 

5510 SUM = SUM + VI (I, J) * BLQD(J) 

5520 NEXT J 

5530 R(I) = (VD{I) - SUM)/CRITMIN 
5540 IF ABS(R{I)) < 2.5 THEN GOTO 5560 
5550 PRINT #2, I, R(I) 

5560 NEXT I 
5570 END 

940 FUNCTION FMED (A{), N, NH) 

945 DEFINT I-N 
950 DIM B(500) 

955 FOR I = 1 TO N 
960 B(I) = A(I) 

965 NEXT I 
980 LL = 1 
990 LR = N 

1000 IF LL >= LR GOTO 1210 
1010 AX = B(NH) 

1020 JNC = LL 
1030 J = LR 

1040 IF JNC > J GOTO 1180 
1050 IF B(JNC) >= AX GOTO 1080 
1060 JNC = JNC + 1 
1070 GOTO 1050 

1080 IF B(J) <= AX GOTO 1110 

1090 J = J - 1 

1100 GOTO 1080 

1110 IF JNC > J GOTO 1170 

1120 WA = B(JNC) 

1130 B(JNC) = B(J) 

1140 B(J) = WA 
1150 JNC = JNC + 1 
1160 J = J - 1 
1170 GOTO 1040 

1180 IF J < NH THEN LL = JNC 
1190 IF NH < JNC THEN LR = J 
1200 GOTO 1000 
1210 FMED = B(NH) 

1220 END FUNCTION 

1500 FUNCTION SCALE (X{), N,K) 

1502 DEFINT I-N 

1503 DIM Y(500), WORK(500), LEFT(500) 

1504 DIM IRIGHT(500), IWEIGHT (500) , IQ(500), IP (500) 
1506 IH = (N + K + 1)\2 

1508 KK = IH * (IH - 1) \ 2 
1510 CALL SORT(X{), N, Y{)) 

1512 FOR I = 1 TO N 




1 + 2 



1514 

1516 

1518 

1520 

1522 

1524 

1526 

1628 

1630 

1632 

1634 

1636 

1638 

1640 

1642 

1644 

1646 

1648 

1650 

1652 

1654 

1656 

1658 

1660 

1662 

1664 

1666 

1668 

1670 

1672 

1674 

1676 

1678 

1680 

1682 

1684 

1686 

1688 

1690 

1692 

1694 

1696 

1698 

1700 

1701 

1702 
1704 
1706 
1708 
1710 
1712 
1714 
1716 
1718 



LEFT (I) = N - 
IRIGHT(I) = N 
NEXT I 
JHELP = N * (N + 1) \ 2 
KNEW = KK + JHELP 
NL = JHELP 
NR = N * N 
IFOUND = 0 
REM 

IF (NR - NL > N) AND (IFOUND = 0) THEN 
J = 1 

FOR I = 2 TO N 

IF LEFT (I) <= IRIGHT(I) THEN 
IWEIGHT(J) = IRIGHT(I) - LEFT (I) + 1 
JHELP = LEFT (I) + IWEIGHT(J) \ 2 
WORK(J) = Y(I) - Y(N + 1 - JHELP) 

J = J + 1 
END IF 
NEXT I 

TRIAL = WHIMED (WORK ( ) , IWEIGHTO, J - 1) 

J = 0 

FOR I = N TO 1 STEP -1 
REM 

IF J < N AND ((Y(I) - Y(N - J) ) < TRIAL) THEN 

J = J + 1 

GOTO 1658 

END IF 

IP (I) = J 

NEXT I 

J = N + 1 

FOR I = 1 TO N 

REM 

IF ((Y(I) - Y(N - J + 2)) > TRIAL) THEN 

J = J - 1 

GOTO 1676 

END IF 

IQ(I) = J 

NEXT I 

ISUMP = 0 

ISUMQ = 0 

FOR I = 1 TO N 

ISUMP = ISUMP + IP (I) 

ISUMQ = ISUMQ + IQ (I) - 1 

NEXT I 

IF KNEW <= ISUMP THEN 
FOR I = 1 TO N 
IRIGHT(I) = IP (I) 

NEXT I 
NR = ISUMP 
ELSE 

IF KNEW > ISUMQ THEN 
FOR I = 1 TO N 
LEFT(I) = IQ(I) 

NEXT I 
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1720 NL = ISUMQ 
1722 ELSE 
1724 QN = TRIAL 
1726 IFOUND = -1 
1728 END IF 
1730 END IF 
1732 GOTO 1630 
1734 END IF 

1736 IF IFOUND = 0 THEN 

1738 J = 1 

1740 FOR I = 2 TO N 

1742 IF LEFT (I) <= IRIGHT(I) THEN 

1744 FOR JJ = LEFT (I) TO IRIGHT(I) 

1746 WORK(J) = Y(I) - Y(N - JJ + 1) 

1748 J = J + 1 

1750 NEXT JJ 

1752 END IF 

1754 NEXT I 



1756 


QN 


= 


FMED (WORK ( ) , 


J 


- 1 , 


KNEW 


1758 


END IF 














1760 


IF 


N 


<: 




9 THEN 








1762 


IF 


N 


= 


2 


THEN 


DN 


= 


.399 




1764 


IF 


N 


= 


3 


THEN 


DN 


= 


.994 




1766 


IF 


N 


= 


4 


THEN 


DN 


= 


.512 




1768 


IF 


N 


= 


5 


THEN 


DN 


= 


.844 




1770 


IF 


N 


= 


6 


THEN 


DN 


= 


.611 




1772 


IF 


N 


= 


7 


THEN 


DN 


= 


.857 




1774 


IF 


N 


= 


8 


THEN 


DN 


= 


.669 




1776 


IF 


N 


= 


9 


THEN 


DN 


= 


.872 




1778 


ELSE 
















1780 


IF 


(N MOD 2) = 


= 1 


THEN DN 


= N , 



1782 IF (N MOD 2) = 0 THEN DN = N / 
1784 END IF 

1786 SCALE = DN * 2.2219 * QN 
1788 END FUNCTION 

570 SUB SORT (A(), N, B()) 

575 DEFINT I-N 

580 DIM JLV(500), JRV(500) 

585 FOR I = 1 TO N 
590 B(I) = A(I) 

595 NEXT I 
600 JSS = 1 
610 JLV(l) = 1 
620 JRV(l) = N 
630 JNDL = JLV(JSS) 

640 JR = JRV(JSS) 

650 JSS = JSS - 1 
660 JNC = JNDL 
670 J = JR 

680 JTWE = (JNDL + JR) \ 2 
690 XX = B(JTWE) 

700 IF B(JNC) >= XX GOTO 730 
710 JNC = JNC + 1 



O 




- NL) 



(N + 1.4) 
(N + 3.8) 



14 



720 GOTO 700 

730 IF XX >= B(J) GOTO 760 

740 J = J - 1 

750 GOTO 730 

760 IF JNC > J GOTO 806 

770 AMM = B(JNC) 

780 B(JNC) = B(J) 

790 B(J) = AMM 

800 JNC = JNC + 1 

804 J = J - 1 

806 IF JNC <= J GOTO 700 

808 IF (J - JNDL) < (JR - JNC) GOTO 822 

810 IF JNDL >= J GOTO 818 

812 JSS = JSS + 1 

814 JLV(JSS) = JNDL 

816 JRV(JSS) = J 

818 JNDL = JNC 

820 GOTO 832 

822 IF JNC >= JR GOTO 830 
824 JSS = JSS + 1 
826 JLV(JSS) = JNC 
828 JRV(JSS) = JR 
830 JR = J 

832 IF JNDL < JR GOTO 660 
834 IF JSS <> 0 GOTO 630 
836 END SXTB 

1900 FUNCTION WHIMED (A(), IW() , NX) 

1902 DEFINT I-N 

1904 DIM ACAND(500) , IWCAND(500) 

1906 NN = NX 

1908 IWTOTAL = 0 

1910 FOR I = 1 TO NN 

1912 IWTOTAL = IWTOTAL + IW(I) 

1914 NEXT I 
1916 IWREST = 0 
1918 REM 

1920 TRIAL = FMED(A(), NN, NN \ 2 + 1) 

1922 IWLEFT = 0 

1924 IWMID = 0 

1926 IWRIGHT = 0 

1928 FOR I = 1 TO NN 

1930 IF A (I) < TRIAL THEN 

1932 IWLEFT = IWLEFT + IW(I) 

1934 ELSE 

1936 IF A (I) > TRIAL THEN 
1938 IWRIGHT = IWRIGHT + IW(I) 

1940 ELSE 

1942 IWMID = IWMID + IW(I) 

1944 END IF 
1946 END IF 
1948 NEXT I 

1950 IF ((2 * IWREST + 2 * IWLEFT) > IWTOTAL) THEN 
1952 KCAND = 0 
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1954 FOR I = 1 TO NN 
1956 IF A (I) < TRIAL THEN 
1958 KCAND = KCAND + 1 
1960 ACAND (KCAND) = A(I) 

1962 IWCAND (KCAND) = IW(I) 

1964 END IF 
1966 NEXT I 
1968 NN = KCAND 
1970 ELSE 

1972 IF ( (2 * IWREST + 2 * IWLEFT + 2 * IWMID) > IWTOTAL) THEN 

1974 WHINED = TRIAL 

1976 GOTO 2014 

1978 ELSE 

1980 KCAND = 0 

1982 FOR I = 1 TO NN 

1984 IF A (I) > TRIAL THEN 

1986 KCAND = KCAND + 1 

1988 ACAND (KCAND) = A (I) 

1990 IWCAND (KCAND) = IW(I) 

1992 END IF 
1994 NEXT I 
1996 NN = KCAND 

1998 IWREST = IWREST + IWLEFT + IWMID 

2000 END IF 

2002 END IF 

2004 FOR I = 1 TO NN 

2006 A(I) = ACAND(I) 

2008 IW(I) = IWCAND (I) 

2010 NEXT I 
2012 GOTO 1918 
2014 REM 

2016 END FUNCTION 
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